Introduction to nonlinear optimization using factor graphs

Non-linear optimization of sparse problems is well explained by describing the problem as a factor graph. A factor graph looks like this:

[Figure: Factor graph]

Toy example of a factor graph. The round nodes are state blocks, labelled 0 to 7. The square nodes are factors encoding measured relations between the state blocks, labelled 1 to 10.

Sparsity: state blocks and factors

In a factor graph, the system’s state vector is partitioned into blocks of reasonably small size, called state blocks.

Each measurement made on the system depends on a small handful of state blocks. These dependencies are encoded as factors.

For each factor, a function of this subgroup of state blocks and of the measurement is used to compute a residual (also known as the 'expectation error', and equivalent to what the filtering literature calls the 'innovation').

The optimizer, or Solver, then tries to find the overall state that minimizes the overall residual in a least-squares sense.

The result is the optimal state, which constitutes the best available solution to the problem.

Factor residuals

To make sparsity visible and exploitable, let us partition the state vector into state blocks,

\[X = \{x_1, \dots, x_N\}\]

Let us define the set of factors

\[C = \{C_1, \dots, C_K\}\]

relating subsets of these blocks, \(X_k = \{x_{k_1}, \dots, x_{k_n}\}\), to measurements \(y_k\).

Each factor \(C_k\) computes an estimation error \(e_k\),

\[e_k = e_k( X_k , y_k )\]

Note

For example, imagine the 4th measurement \(y_4\) is a function of:

  • the robot position \(x_1\),

  • the robot orientation \(x_2\),

  • the sensor position \(x_5\) and orientation \(x_6\) in the robot, and

  • the position of a landmark in the environment, \(x_{17}\).

Then the error function above reads,

\[e_4 = e_4 ( x_1, x_2, x_5, x_6, x_{17}, y_4)\]
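As a minimal sketch of such an error function (in Python rather than WOLF's C++, and with a hypothetical relative-position factor instead of the full sensor model above), the error is simply the predicted quantity minus the measured one:

```python
import numpy as np

# Hypothetical relative-position factor: the measurement y_k observes
# the offset between two 2D state blocks x_i and x_j.
def error(x_i, x_j, y_k):
    """Estimation error e_k = e_k(X_k, y_k) = h(x_i, x_j) - y_k."""
    return (x_j - x_i) - y_k

x_1 = np.array([0.0, 0.0])  # robot position
x_2 = np.array([1.0, 2.0])  # landmark position
y = np.array([1.1, 1.9])    # measured offset (slightly noisy)

e = error(x_1, x_2, y)      # small but nonzero estimation error
```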

Let us also define the amount of information provided by the measurement \(y_k\) of this factor through an information matrix \(W_k\), the inverse of the measurement's covariance matrix \(Q_k\).

Optimal state

Then, the optimal state \(x^*\) is the solution to the least-squares minimization problem,

\[x^* = \textrm{argmin}_x \sum_k (e_{k}(x)^\top \cdot W_k \cdot e_k(x) )\]

In WOLF, we define the residual as the estimation error weighted by its information matrix. We do this by realizing that, if we factorize \(W_k = W_k^{1/2} W_k^{\top/2}\), then

\[e_{k}(x)^\top \cdot W_k \cdot e_k(x) = e_{k}(x)^\top \cdot W_k^{1/2} \cdot W_k^{\top/2} \cdot e_k(x) = r_{k}(x)^\top \cdot r_k(x)\]

obtaining a definition of the residual that is indeed a weighted estimation error,

\[r_k(x) \triangleq W_k^{\top/2} \cdot e_k(X_k, y_k)\]
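In practice, the square root \(W_k^{1/2}\) can be obtained with a Cholesky decomposition of the information matrix. A small NumPy sketch (the covariance values below are made up for illustration) verifies that the residual indeed carries the weighted squared error:

```python
import numpy as np

Q = np.array([[0.04, 0.0],
              [0.0,  0.01]])    # measurement covariance matrix Q_k
W = np.linalg.inv(Q)            # information matrix W_k = Q_k^{-1}
L = np.linalg.cholesky(W)       # W_k = L @ L.T, i.e. W_k^{1/2} = L

e = np.array([-0.1, 0.1])       # estimation error e_k
r = L.T @ e                     # residual r_k = W_k^{T/2} e_k

# r^T r equals the weighted squared error e^T W e
assert np.isclose(r @ r, e @ W @ e)
```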

Note

In WOLF (see the WOLF tree),

  • the residual evaluation function is stored in the class Factor.

  • the measurement \(y_k\), its covariance matrix \(Q_k\), and the square root of its information matrix \(W_k^{\top/2}\), are stored in the Feature.

Iterative optimization via linearizations

Non-linear optimizers are referred to here as solvers, because they are responsible for finding the state that minimizes the residual, i.e., the optimal state.

Solvers for non-linear problems proceed by

  1. Linearize all the factors.

For each factor \(k\):

  • Evaluate the residual (for simplicity, we assume it depends on only two state blocks, indexed by \(i\) and \(j\)),

    \[r_k = W_k^{\top/2} \, e_k (x_i, x_j, y_k)\]
  • Evaluate the Jacobians wrt. each concerned state block,

    \[J^k_i = \frac{\partial r_k}{\partial x_i} \quad , \quad J^k_j = \frac{\partial r_k}{\partial x_j}\]
  • Append the Jacobians and the residual to a sparse, full-size Jacobian matrix \(A\) and residual vector \(b\), placing them at the rows corresponding to the factor index \(k\) and the columns corresponding to the state indices \(i,j\).

    \[A_{ki} = J^k_i \quad , \quad A_{kj} = J^k_j \quad , \quad b_k = r_k\]
  2. Compute an optimal step correction \(dx\) of the linearized system, e.g. by solving the normal equations \(A^\top A \, dx = -A^\top b\)

  3. Update the state \(x\) with this correction step

\[x \gets x \oplus dx\]

where \(\oplus\) is the composition law of each state block's manifold. In effect, the update is made state block by state block:

  • For plain vectors, it is a regular sum

    \[v \oplus \delta v = v + \delta v\]
  • For rotations, we use the exponential map

    \[R \oplus \delta \theta = R \cdot \exp (\delta \theta)\]
  4. Iterate from 1. until convergence
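The manifold update of step 3 can be sketched for a 2D rotation block (SO(2) is used here for brevity; rotations in 3D follow the same pattern via the exponential map):

```python
import numpy as np

# Sketch of the manifold composition x ⊕ dx for a 2D rotation block:
# R ⊕ δθ = R · exp(δθ), where exp maps an angle increment to SO(2).
def exp_so2(dtheta):
    c, s = np.cos(dtheta), np.sin(dtheta)
    return np.array([[c, -s],
                     [s,  c]])

R = exp_so2(0.3)              # current rotation estimate
dtheta = 0.1                  # correction step produced by the solver
R_new = R @ exp_so2(dtheta)   # the update stays on the manifold

assert np.allclose(R_new, exp_so2(0.4))       # still a rotation, by 0.4 rad
assert np.isclose(np.linalg.det(R_new), 1.0)  # det(R) = 1 on SO(2)
```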

Note

In practice, operations 2, 3 and 4 are delegated to the Solver.

Regarding operation 1, at each iteration the solver calls WOLF whenever it needs to linearize a factor.

The factor’s math (residual and Jacobians) is therefore implemented in the WOLF classes deriving from FactorBase.

In most cases, this math is very simple.

The power of graph-based optimization (and of WOLF, for that matter) is that, by properly organizing a large and heterogeneous collection of these simple factors, very large and complex problems can be solved efficiently.
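The whole iterative scheme above can be sketched end to end on a toy problem (plain NumPy, scalar state blocks, identity information matrices; a dense stand-in for the sparse machinery described above):

```python
import numpy as np

# Toy graph: two scalar state blocks x = (x1, x2) and three factors:
#   r1 = x1 - 0          (prior on x1)
#   r2 = (x2 - x1) - 1   (relative measurement between the blocks)
#   r3 = x2 - 1.2        (absolute measurement on x2)
def residuals(x):
    return np.array([x[0], (x[1] - x[0]) - 1.0, x[1] - 1.2])

def jacobian(x):
    # One row per factor, one column per state block.
    # In a real problem this matrix is large and sparse.
    return np.array([[ 1.0, 0.0],
                     [-1.0, 1.0],
                     [ 0.0, 1.0]])

x = np.zeros(2)                                # initial guess
for _ in range(5):                             # Gauss-Newton iterations
    A, b = jacobian(x), residuals(x)
    dx = np.linalg.solve(A.T @ A, -A.T @ b)    # optimal step of the linearized system
    x = x + dx                                 # plain-vector composition x ⊕ dx
    if np.linalg.norm(dx) < 1e-9:              # convergence check
        break

# x now holds the least-squares optimum balancing the three factors
```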